Skip to content
This repository was archived by the owner on Apr 12, 2024. It is now read-only.

Conversation

@caitp
Copy link
Contributor

@caitp caitp commented Jul 16, 2014

ngSanitize will now permit opening braces in text content, provided they
are not followed by either an unescaped backslash, or by an ASCII letter
(u+0041 - u+005A, u+0061 - u+007A), in compliance with rules of the parsing
spec, without taking insertion mode into account.

BREAKING CHANGE

Previously, $sanitize would "fix" invalid markup in which a space preceded
alphanumeric characters in a start-tag. Following this change, any opening
angle bracket which is not followed by either a forward slash, or by an
ASCII letter (a-z | A-Z) will not be considered a start tag delimiter, per
the HTML parsing spec
(http://www.whatwg.org/specs/web-apps/current-work/multipage/parsing.html).

Closes #8193

sylvain-hamel and others added 2 commits July 15, 2014 13:05
… text content

ngSanitize will now permit opening braces in text content, provided they are not followed by either
an unescaped backslash, or by an ASCII letter (u+0041 - u+005A, u+0061 - u+007A), in compliance with
rules of the parsing spec, without taking insertion mode into account.

BREAKING CHANGE

Previously, $sanitize would "fix" invalid markup in which a space preceded alphanumeric characters
in a start-tag. Following this change, any opening angle bracket which is not followed by either a
forward slash, or by an ASCII letter (a-z | A-Z) will not be considered a start tag delimiter, per
the HTML parsing spec (http://www.whatwg.org/specs/web-apps/current-work/multipage/parsing.html).
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/cc @IgorMinar PTAL --- This particular block is only here to make sure that we throw if we find an apparent start-tag without a trailing >

This might not be the right thing to do --- if we don't have a trailing >, we could potentially just treat it as a text node. I'm not sure what the best thing to do in this case is.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it is better to treat as a text node. IMO the sanitizer should be secure but tolerant

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that's fine.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I this really a bad text string? I would let it go as a text block. For instance:

In my math project I found that a<b when b=10

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As far as HTML parsing is concerned, /</[a-zA-Z/ is the start of a tag, so we shouldn't "fix" this, I think

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Although arguably we are not trying to "parse" html here, only sanitize text that may be inadvertently parsed by a browser later

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that this is right. we shouldn't try to fix broken html.

@petebacondarwin
Copy link
Contributor

Other than that LGTM

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shouldn't this < be encoded just to be safe?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is encoded in the real world, however in the test, the chars handler just appends the value to a string

@IgorMinar
Copy link
Contributor

LGTM except for the one test where < is not encoded. how is it different from the test on line 87?

@caitp
Copy link
Contributor Author

caitp commented Jul 16, 2014

We're passing a handler to htmlParser() which just appends the parsed text to a string, to assert that a certain substring was treated as text content. That's why it's not encoded here. In the case of expectHTML(), where we're using the real parser, the value is encoded because of the handler used by $sanitize

@IgorMinar
Copy link
Contributor

I see. Thanks for the explanation. LGTM then.

@petebacondarwin
Copy link
Contributor

I still don't think that text containing a some b<a thing should throw an exception. It should just be sanitized to some b&lt;a thing

@caitp caitp closed this in f6681d4 Jul 16, 2014
@caitp
Copy link
Contributor Author

caitp commented Jul 16, 2014

@petebacondarwin maybe we should see how people react. I agree that it kind of sucks

caitp added a commit that referenced this pull request Jul 16, 2014
… text content

ngSanitize will now permit opening braces in text content, provided they are not followed by either
an unescaped backslash, or by an ASCII letter (u+0041 - u+005A, u+0061 - u+007A), in compliance with
rules of the parsing spec, without taking insertion mode into account.

BREAKING CHANGE

Previously, $sanitize would "fix" invalid markup in which a space preceded alphanumeric characters
in a start-tag. Following this change, any opening angle bracket which is not followed by either a
forward slash, or by an ASCII letter (a-z | A-Z) will not be considered a start tag delimiter, per
the HTML parsing spec (http://www.whatwg.org/specs/web-apps/current-work/multipage/parsing.html).

Closes #8212
Closes #8193
caitp added a commit to caitp/angular.js that referenced this pull request Jul 16, 2014
… text content

ngSanitize will now permit opening braces in text content, provided they are not followed by either
an unescaped backslash, or by an ASCII letter (u+0041 - u+005A, u+0061 - u+007A), in compliance with
rules of the parsing spec, without taking insertion mode into account.

BREAKING CHANGE

Previously, $sanitize would "fix" invalid markup in which a space preceded alphanumeric characters
in a start-tag. Following this change, any opening angle bracket which is not followed by either a
forward slash, or by an ASCII letter (a-z | A-Z) will not be considered a start tag delimiter, per
the HTML parsing spec (http://www.whatwg.org/specs/web-apps/current-work/multipage/parsing.html).

Closes angular#8212
Closes angular#8193
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants